Introduction

Evolutionists offer a number of 
arguments in support of common 
descent, i.e., the claim that all life 
descended from a common 
ancestor. One of the more 
prominent arguments, an argument 
that figures heavily in lay 
conversations and the public 
consciousness writ large, is as 
follows: When we look across the 
animal world and organisms more 
broadly, we find that there are 
"striking genetic similarities" 
between species with otherwise 
distinct phenotypes (i.e., distinct 
observable characteristics). For 
example, despite significant 
differences in each organism's 
external traits, chimpanzee and 
human genomes are approximately 
99% identical 1 (or 98%, 2 96%, 3 
94% 4 depending on the research 
study). 

Popular scientists, like Richard 
Dawkins, often employ claims of 
human-chimp genetic similarity to 
further arguments about common 
descent and to oppose the notion 
of human exceptionalism and the 
Christian and Islamic narrative of 
human origins from a created 
ancestor, namely Adam. 5 


What the non-specialist public
rarely sees are the details and
distinctive nature of this
"striking" genetic similarity, so often
touted by public intellectuals and
science reporting alike. 6

Taking a closer look at the
scientific literature provides further
information that puts these
similarity claims into proper
context. It becomes apparent
that the conclusion of 99%
similarity is an oversimplification,
and that the scientific conclusions
drawn are much more cautious
and, by themselves, much less
definitive of common descent than
is often assumed in the public
discourse.

"Popular" Science 

When one first hears genetic
similarity arguments, it is difficult
not to be completely taken in by
them. How can anyone argue with
99%? Upon actually delving into
the literature, however, one quickly
realizes that the issue is not as
straightforward as it might first
appear. For example, Chris Moran,
professor of animal genetics at the
University of Sydney, remarks:

Depending upon what it is
that you are comparing you
can say 'Yes, there's a very
high degree of similarity, for
example, between a human
and a pig protein coding
sequence', but if you
compare rapidly evolving
non-coding sequences from
a similar location in the
genome, you may not be
able to recognise any
similarity at all. This means
that blanket comparisons of
all DNA sequences between
species are not very
meaningful. 7

Unfortunately, what many fail to
understand is that what appears in
the scientific literature and what is
reported to the lay public are
sometimes worlds apart, especially
when the issue is as ideologically
charged as human origins.
Complex scientific work gets
distilled into soundbites for mass
consumption. This is not a problem
in itself, but when that filtering
process is shaped by an
ideological narrative such as "cold,
hard science vs. irrational Bible
thumping," the resulting
simplifications should be
reexamined.

With that in mind, what does the
scientific literature have to say?
What we will find is that
comparing two genomes is far from
a trivial task. Specifically, a review
of the major papers on the topic
reveals:

• All of them assume common 
descent as axiomatic and 
beyond question. In other 
words, none of the geneticists 
researching human-chimp 
genetic similarity are attempting 
to prove or provide systematic 
argumentation for common 
descent by way of tallying 
matching nucleotides between 
two genomes. This is contrary to 
the popular perception that 99% 
similarity is an argument, in 
itself, for common descent. 

• No research study has attempted 
to compare 100% of the human 
and chimp genomes in order to 
determine an overall percent 
similarity. Each study limits its 
comparison to subsections of the 
genome, and, in some studies, 
including the landmark 1975 
paper that first claimed to have 
discovered 99% similarity, the 
compared regions constituted 
less than 2% of the total 
genome. 8 

• There is no single agreed-upon
or widely used metric by which
to quantify the similarity of two
genomes. In fact, each paper on
the topic uses a different method
and different parameters in
selecting and parsing the
relevant data.

• Many of the key assumptions the 
major chimp-human genome 
research papers made in 
determining 99% similarity have 
since proved to be erroneous. 

Comparative Metrics 

99% of lab mouse genes have direct
human counterparts, 9 and 80% of
human genes overlap with those of
mice. 10 90% of human-cat genes
match, 11 and 94% of dog-cat genes
match. 12 There is 60% overlap
between human and fruit fly genes
and 31% overlap between human
and yeast genes. 13

Is 99% human-chimp genome 
similarity less impressive in light of 
the fact that domestic cats share 
90% of their genes with humans 
and yeast share over 30% of their 
genes with us, etc.? What should 
we make of these various 
quantitative comparisons? In 
reality, it is difficult to make sense 
of these percentages without a 
uniform metric to reference. 
Unfortunately, the biological 
sciences do not provide one. 


We must keep in mind that, as of
2014, the gene sequencing that
allows for these kinds of
comparisons has only been done
for a limited number of organisms
(cats, dogs, mice, rats, cows,
several great apes, fruit flies, yeast,
certain bacteria, etc.), and even
then, the genomes of very few
species have been completely 14
sequenced. 15 Of those that have been
completely sequenced, only a few
have been directly compared with the
human genome, such as those of the
great apes. So, evolutionary biologists
can give neither a robust nor an
exact range of similarity, for
example, for all mammals, or
mammals vs. reptiles vs. fish, or
vertebrates vs. invertebrates, or
plants vs. animals, etc. This matters:
what if all vertebrates or all mammals
fall within an 80%-99% range of
genetic similarity to each other? If
we knew that range, we could
make truly comparative statements
like, "chimp-human genes overlap,
say, 50% more than the average
degree of overlap between any two
other mammalian species."
Comparisons like that might at least
have a chance at being minimally
probative.

The logic here is that 
we should expect a high degree of 
gene overlap between organisms
that are anatomically similar. This 
is because, in the most basic sense, 
an organism's phenotype is an 
expression of its genotype. 
Therefore, similarities between 
phenotypes should translate into 
similarities in genotypes to at least 
some degree. For example, cats, 
dogs, chimps, mice, and humans 
all have similar circulatory systems, 
gastrointestinal systems, respiratory 
systems, reproductive systems, 
immune systems, metabolic 
systems, and too many other 
parallels to list. Given this, what 
percentage of the genotypes should 
we expect to overlap simply due to 
all the major phenotypic parallels 
we observe between two or more 
organisms? As a rough benchmark, 
just look at how phenotypically 
divergent humans and fruit flies 
are, yet a whopping 60% of our 
genes overlap! 

As a simple analogy, we would not 
be too incredulous if it were 
claimed that the technology in an 
Apple iPhone and a Samsung 
Galaxy are 99% similar. They are 
both smartphones of a similar size 
with similar functionality: making 
calls, connecting to the internet, 
supporting applications. There is 
going to be a high degree of 
overlap just because these 
functions require essentially the
same hardware: microprocessors,
Wi-Fi modules, cameras,
touchscreens, mics, speakers, etc.
Thus, the claim that the iPhone and
the Galaxy are 99% alike
would not mean much, especially 
if it turns out that an iPhone and a 
breadmaker are 60% alike. But if it 
were claimed that the iPhone and 
Galaxy are 50% more similar than 
the average similarity between any 
two smartphones, then that would 
imply something significant and 
unobvious, e.g., either Apple or 
Samsung is stealing the other's 
phone design. 

In other words, when it comes to 
human-chimp similarity, is the 99% 
indicative of something significant 
about the relation between chimps 
and humans or is the 99% simply 
riding on the particulars of the 
comparison scheme the researchers 
chose in determining that figure?
This question is especially crucial
given the complex and input-sensitive
algorithmic methods used
to actually compare two DNA 
sequences. 

Ultimately, genetics and the
biological sciences generally do
not offer an objective yardstick by
which to measure the similarity of
two genomes and, in general, there
is no straightforward or standard
way to give a percent similarity
between two multidimensional
objects. For example, what is the
percent similarity between an apple
and an orange? Given that
there are countless ways to
compare the two, a meaningful
answer would have to be
benchmarked against, say, how
similar we deem other fruits to be
to one another on average.

5 | Can Islam Object to Evolution? Evaluating Human-Chimp Genetic Similarity

All in all, the lack of a frame of
reference to normalize comparative
data renders the 99% similarity
factoid essentially meaningless.
Multiple renowned research
geneticists quoted in Science's
"The Myth of 1%" concur in this
seemingly stark assessment:

Researchers are finding that 
on top of the 7 % 
distinction, chunks of 
missing DNA, extra genes, 
altered connections in gene 
networks, and the very 
structure of chromosomes 
confound any quantification 
of "humanness" versus
"chimpness." [...] There
isn't one single way to
express the genetic distance
between two complicated
living organisms. [...] Could
researchers combine all of
what's known and come up
with a precise percentage
difference between humans
and chimpanzees? "I don't
think there's any way to
calculate a number," says
geneticist Svante Pääbo, a
Chimp Consortium member
based at the Max Planck
Institute for Evolutionary
Anthropology in Leipzig,
Germany. "In the end, it's a
political and social and
cultural thing about how we
see our differences." 16

Remarkable Divergence

Chimpanzee and human Y 
chromosomes are remarkably 
divergent in structure and gene 
content. 17 

That is the title of a prominent
2010 research paper that adds
another dimension to human-chimp
genetic comparisons.
Hughes, et al., found that the
chimpanzee Y-chromosome has
only 47% as many protein-coding
elements and only two-thirds as
many distinct genes as the human
Y-chromosome. Also, more than
30% of the chimp Y-chromosome
lacks a counterpart on the human
Y-chromosome and vice versa (see
Figure 1). In one part of the paper,
the authors even state: "The
difference in MSY [male-specific
region of the Y chromosome] gene
content in chimpanzee and human is
more comparable to the difference in
autosomal gene content in chicken
and human."

What is truly telling is that Hughes, 
et al., recreated the DNA 
comparison of other studies in 
order to benchmark their alignment 
techniques. 

As expected, we found that the
degree of similarity between 
orthologous chimpanzee and 
human MSY sequences (98.3% 
nucleotide identity) differs only 
modestly from that reported when 
comparing the rest of the 
chimpanzee and human genomes 
(98.8%). 

This means that "remarkable
divergence" exists despite 98%
sequence similarity in the
Y-chromosome, implying that the rest
of the genome may also contain
major disparities even though
sequence similarity is determined
to be 99%, 98%, or 95%.

This is not the first example of 
divergence that geneticists have 
uncovered between human and 
great ape genetic sequences. The 
field was privy to Y-chromosome 
misalignments as far back as 
1998. 18 Chromosomes 4, 9, 12, 19
and, particularly, 21 have also
been found to contain "large,
non-random regions of difference."
20 Interestingly, these discrepancies
are usually investigated and
emphasized in research seeking to 
discover the genetic secret to 
"humanness," namely what makes 
us characteristically human as 
opposed to mere chimp. 

[Figure: schematic comparison of the chimpanzee Y and human Y MSY]

a, Schematic representations of chromosomes. cen, centromere; Yp, short
arm; Yq, long arm. For both chromosomes, the MSY is indicated. Six
sequence classes are shown, four of which are MSY euchromatin. ('Other'
denotes MSY single-copy sequences that are not X-degenerate or
X-transposed.) Chromosomes are drawn to scale, with the exception of the
large heterochromatic block on human Yq. b, Sizes (in Mb) of four MSY
euchromatin sequence classes in chimpanzee and human. c, Percentages of
ampliconic and X-degenerate sequences present on chimpanzee Y
chromosome that are also present on human Y chromosome, and vice versa.

Figure 1

Discrepancies are also emphasized 
in phylogenetics, i.e., the genetic 
analysis used to determine how 
different organisms are related on 
the evolutionary tree. For example, 
in 2007 Ebersberger, et al., claimed:

For about 23% of our
genome, we share no
immediate genetic ancestry
with our closest living
relative, the chimpanzee.

[...] Thus, in two-thirds of
the cases a genealogy results
in which humans and
chimpanzees are not each
other's closest genetic
relatives. The corresponding
genealogies are incongruent
with the species tree. In
accordance with the
experimental evidences, this
implies that there is no such
thing as a unique
evolutionary history of the
human genome. Rather, it
resembles a patchwork of
individual regions following
their own genealogy. 21


One might ask, why have these
in-depth chromosomal studies, like
the one from Hughes, et al., not
been conducted for all ape
chromosomes? The chromosomes
of rodents and fruit flies, for
example, are known in great detail,
and the reason is that those organisms
can be experimented on endlessly
in medical research. Not so with
apes. Ethical standards and animal
conservation regulations disallow
invasive and terminal
experimentation on apes. For this
reason, funding for ape
chromosome research is relatively
sparse because, in the end, there
are few opportunities to apply
any findings. Why waste millions
of dollars in funding
on delving into ape chromosomes
when, afterward, one is not
allowed, due to ethical standards,
to use those findings to further
medical science through genetic
modification and experimentation
on living apes?

In any case, given these known 
chromosomal and phylogenetic 
discrepancies across multiple 
regions of the human-chimp 
genome map, what are we to make 
of the 99% human-chimp similarity 
claim? 

The answer lies in the details of the
methodologies geneticists use to
sequence and align the human and
chimpanzee genomes. For example,
since humans have 46
chromosomes compared to 48 in
chimps, is that not a 4.2%
difference right off the bat?
Obviously, that is deliberately
simplistic. But the point is that
comparing the human and chimp
genomes is not a simple matter of
lining the two up and seeing how
much they match, though that is
precisely the impression a
non-specialist may come away with.

In fact, science and natural history
museums with exhibits dedicated
to evolution (e.g., the "Explore
Evolution" 22 project that was
featured at numerous natural
history museums across the US)
often relay 23 this simplistic and
ultimately inaccurate notion of
genetic similarity to the public by
printing a few thousand aligned
nucleotides from each genome
onto posters side by side, as if to
imply that human-chimp genetic
overlap is as plain as day (see
Figures 2, 3, and 4). Just open your
eyes and see!



Figure 3

Figure 4


King and Wilson's 99%

Let's dig into the details of gene 
sequencing and comparison. The 
initial research claiming 99% 
similarity came in 1975 from King 
and Wilson, who used three 
biochemical methods to indirectly 
measure genetic overlap by 
examining select human and chimp 
proteins. One important note is that 
King and Wilson were not setting 
out to prove that human and chimp 
genetics highly overlap. Actually, 
this was a surprising result for 
them, and they concluded: 

The intriguing result,
documented in this article,
is that all the biochemical
methods agree in showing
that the genetic distance
between humans and the
chimpanzee is probably too
small to account for their
substantial organismal
differences. 24

Of course, what was filtered down 
to the public (and what 
was interpreted later by many in 
the scientific community) was that 
King and Wilson's research 
provided prime evidence for 
common descent. 25 It is interesting 
that King and Wilson themselves
felt that the discovered genetic 
similarity belied the vast 
divergence between the two 
species, so much so that the role of 
genetic sequence as the primary 
determinant of an organism's 
phenotype was questioned. 

Much Ado About 2%

Besides this point, let's also look 
more closely at King and Wilson's 
research methods. The first thing to 
note is that, due to the 
technological limits of the time, 
their methods focused on an 
analysis of human and chimp 
proteins and not the actual 
genome. Even then, they only 
compared a handful of homologous 
proteins as those are the most 
readily comparable. Nowhere is it 
claimed that the selected proteins 
are representative of the vast 
variety of proteins in both human 
and chimp bodies. In fact, the paper
explicitly cautions:

Owing to the limitations of
conventional sequencing
methods, exactly
comparable information is
not available for larger
proteins. Indeed, the
sequence information
available for the proteins
already mentioned [in this
paper] is not yet complete.

Beyond these gaps, what is more 
significant is that at most, proteins 
only reflect the coding portion of 
the genome while non-coding 
areas of the genome are completely 
missed. Interestingly, 98% of the 
human genome is non-coding. 26 

What is the difference between the 
coding and non-coding regions of 
DNA? As it is commonly put, DNA 
carries the genetic instructions used 
in the development and function of 
an organism's biology. The 
mechanics of how these
instructions are implemented are
quite complex and not fully known,
but, to put it simply, the coding 
portion of DNA encodes the 
various proteins which serve as the 
fundamental building blocks of 
bodily function. In humans, less 
than 2% of all DNA is associated 
with this coding process. 

For decades, many biologists
insisted that the non-coding regions
of the genome, which constitute
over 98% of our DNA, were simply
"junk." 27 They reasoned that, since
non-coding regions played no
discernible part in the formation of
proteins, these regions had no
biological function. This
assumption, of course, has colored
all subsequent research on
human-chimp genetic overlap.

For King and Wilson's iconic 
paper, the fact that their 
comparison only focused on 
coding elements of the genome 
means that the 99% similarity they 
found is inapplicable to the vast 
majority—over 98%—of total 
human-chimp genetic material. 

Salacious Headlines 

Even if scientific consensus agreed
that non-coding regions of the
genome serve no biological
function, it would be a
misinterpretation to state that
human-chimp DNA is 99% similar
based on King and Wilson's work.
As far as King and Wilson are
concerned, it would be more
accurate to claim, e.g., "Human
and chimp DNA is 99% similar...
in the 2% of the genome that has
been compared." Of course, a
headline along those lines would
not attract much attention, much
less strike anyone as an
earth-shattering result.

To make matters worse, the 99%
similarity claim of King and Wilson
became even less significant once it
became apparent 28 that "junk"
non-coding DNA is not as biologically
useless 29 as previously assumed.
More recently, geneticists have
claimed that as much as 80% of
non-coding DNA is
biochemically active. 30 And,
even more strikingly, they are
discovering how non-coding DNA
plays an essential role in regulating
crucial genetic processes. In other
words, what was, until as
recently as 2010, assumed to be
"junk" and was, for the most part,
disregarded in comparisons
between human and chimp genetics is
now understood by biologists to be
a critical component of our
genotypes. As Ewan Birney of the
European Bioinformatics Institute
tellingly puts it: "What is
remarkable is how much of [the
genome] is doing at least
something. It has changed my
perception of the genome." 31 Go
figure! Much of the 98% of our genome
once written off as "junk" is "doing at
least something" and is not
completely inert waste.

Given the very selective and 
limited human-chimp genome 
comparisons that have been done 
by King, Wilson, and others, it is no 
surprise that the more focused 
studies that analyze specific 
chromosomes in detail, such as the 
Y-chromosome study cited above, 
find "remarkable divergences." 
Other Studies

A review of human-chimp genome 
comparisons since King and 
Wilson's paper shows many of 
them base their conclusions 
exclusively on the coding portion 
of the genome, which only 
accounts for 2% of the entire 
genome (e.g., Wildman, et 
al., 32 Nielsen, et al. 33 ). The rest of 
the literature—all of which 
predates the 2010 research on the 
importance of non-coding 
regions—includes both coding and 
non-coding portions to varying 
extents (though non-coding regions 
are generally underemphasized). 
Nonetheless, these studies limit 
their comparison to some portion 
of the total genome, meaning there 
is no, as it were, end-to-end 
comparison of the entirety of the 
human and chimp genomic 
sequences. 

This, of course, is a given for any 
research that predated the 
completion of the Human Genome 
Project (HGP) in 2003 and the 
chimp draft genome of the 2005 
Chimp Consortium. 34 Obviously, 
no one could provide a 
comprehensive comparison of the 
entirety of the two genomes prior to 
them being (nearly) fully
sequenced, in 2003 and 2005,
respectively. (And even the chimp
genome sequence is a draft. More
on that later.)

For example, Britten 35 in 2002 only 
compared 846,016 bases out of the 
total roughly 3.08 billion 36 that 
constitute the human genome, 
which is just 0.03% of the 
total. Arnason, et al., six years
prior, had only considered
165,000, which is about 0.005%. 37 Liu, et
al., in 2003, compared nearly 5
million, which is about 0.16% of the
total. 38 Ebersberger, et al., in 2002,
compared about 3 million, which is
0.1%. 39 Anzai, et al., specifically
looked at the MHC multigene 
region of the genome, which is 
associated with the immune 
response of vertebrates; in total, it 
constitutes 0.06% of the 
genome. 40 Thomas, et al., 41 
considered 0.06% in 2003 
and Nielsen, et al., 42 considered 
0.6% in 2005. 

The only study to take into account 
a sizable majority of the human 
and chimp genomes was the 2005 
Chimpanzee Sequencing and 
Analysis Consortium, which 
compared 2.3 billion nucleotides,
i.e., approximately 75% of the
total. 43
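As a quick check on the arithmetic in this section, the short sketch below divides each study's compared base count by the roughly 3.08 billion bases the text gives for the human genome. The counts are the ones quoted above; where the text says "nearly 5 million" or "about 3 million", round numbers are used, so those outputs are approximations. It is only a calculator for the stated figures, not a re-analysis of the studies.

```python
"""Share of the human genome covered by each comparison, using the figures quoted in the text."""

HUMAN_GENOME_BASES = 3.08e9  # approximate total given in the text

# Bases compared per study, as quoted in the text (round numbers where it says "nearly"/"about").
compared_bases = {
    "Britten (2002)": 846_016,
    "Arnason et al. (1996)": 165_000,
    "Liu et al. (2003), ~5 million": 5_000_000,
    "Ebersberger et al. (2002), ~3 million": 3_000_000,
    "Chimp Consortium (2005)": 2_300_000_000,
}


def percent_of_genome(bases: float, total: float = HUMAN_GENOME_BASES) -> float:
    """Return the compared bases as a percentage of the whole genome."""
    return 100.0 * bases / total


if __name__ == "__main__":
    for study, bases in compared_bases.items():
        print(f"{study:40s} {percent_of_genome(bases):.4f}%")
```

With the 3.08 billion denominator, 2.3 billion bases comes to roughly 75% of the genome, and the smaller studies all fall well under 1%.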

In truth, none of these studies 
unqualifiedly claim 99% similarity 
between human and chimp
genomes. Rather, the caveat is
always there (sometimes more
explicitly, sometimes less) that the
95%, 98%, or 99% similarity
discovered is limited to
the partial segments of the genome
aligned.

Draft Sequences 

Now let's dig deeper into modern 
sequencing and genome 
comparison techniques in order to 
get more insight into the findings of 
the 2005 Chimp Consortium, 
which came closest to comparing 
the entirety of the human-chimp 
genetic sequence. 

Prior to actually comparing DNA, 
geneticists have to first sequence 
the genomes in question, which is 
in itself a monumental task. As
noted above, only a handful of
species' genomes have been
completely sequenced. This is
because sequencing projects can
be expensive. The International
Human Genome Project (HGP), for
example, required $3 billion in
funding and took approximately 13
years to complete. The 2005
Chimpanzee Sequencing and
Analysis Consortium, in contrast,
did not attempt to sequence the
chimp genome to the same level of
rigor as the HGP and only ended
up covering 94% of the entirety of
the genome. 44 Rather than
sequence the chimp genome all the 
way to completion, researchers 
used the human genome as a 
"blueprint" to assemble isolated 
fragments of sequenced chimp 
DNA. This was done under the 
assumption that humans and 
chimps are closely related, such 
that the human genome can be 
used as a reference to map the 
fragmented chimp DNA. The 
overly cynical among us might be 
tempted to think that the fact that 
the human genome was utilized to 
sequence the chimp genome would 
have important implications for 
later comparisons of the two. 

Selective Comparison

The impression the lay public 
might get from unqualified claims 
of 99% human-chimp similarity is 
that geneticists lined up the 
genomes and compared sequences 
of the billions of nucleotides 
constituting DNA structure, i.e., A, 
T, C, G. For example, here are the 
first 100 bases of chimp 
mitochondrial DNA: 
GTTTATGTAGCTTACCCCCTCAA
AGCAATACACTGAAAATGTTTCG
ACGGGTTTACATCACCCCATAAA
CAAACAGGTTTGGTCCTAGCCTT
TCTATTAG

And the first 100 for human 
mitochondrial DNA: 

GATCACAGGTCTATCACCCTATT 
AACCACTCACGGGAGCTCTCCAT 
GCATTTGGTATTTTCGTCTGGGG 
GGTGTGCACGCGATAGCATTGC 
GAGACGCTG 
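To see why "lining the two up" is not what comparison actually involves, the sketch below does exactly that naive thing with the two 100-base excerpts quoted above: it compares them position by position, with no gaps and no alignment. It is an illustration only; the score says little about relatedness, because nothing guarantees that position 1 of one published excerpt corresponds to position 1 of the other.

```python
"""Naive column-by-column comparison of the two 100-base excerpts quoted in the text."""

CHIMP_MT_FIRST_100 = (
    "GTTTATGTAGCTTACCCCCTCAA"
    "AGCAATACACTGAAAATGTTTCG"
    "ACGGGTTTACATCACCCCATAAA"
    "CAAACAGGTTTGGTCCTAGCCTT"
    "TCTATTAG"
)
HUMAN_MT_FIRST_100 = (
    "GATCACAGGTCTATCACCCTATT"
    "AACCACTCACGGGAGCTCTCCAT"
    "GCATTTGGTATTTTCGTCTGGGG"
    "GGTGTGCACGCGATAGCATTGC"
    "GAGACGCTG"
)


def naive_identity(a: str, b: str) -> float:
    """Percent of positions where a[i] == b[i]; assumes equal lengths and no gaps."""
    if len(a) != len(b):
        raise ValueError("naive comparison needs sequences of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    score = naive_identity(CHIMP_MT_FIRST_100, HUMAN_MT_FIRST_100)
    print(f"{score:.1f}% identical position by position")
```

For scale, two unrelated random sequences with equal base frequencies would agree at about 25% of positions by chance alone, so a naive score in that neighborhood tells us essentially nothing.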

Given that the entire human 
genome is on the order of three 
billion nucleotides and the chimp 
genome is roughly 10% larger, any 
notion of "direct" comparison is far 
beyond consideration. In fact, 
geneticists employ the help of
statisticians, mathematicians, and
computer programmers to produce
algorithms and software (e.g.,
BLAST) capable of finding
alignments between massive
sequences. 45
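Search tools such as BLAST do not compare every base against every other base. Broadly, they first look for short exact "seed" words shared by the query and the database, and only then try to extend those hits into local alignments. The sketch below shows only the seeding idea, in a deliberately simplified form; the function names and the word size `k` are hypothetical choices for illustration, not BLAST's actual implementation or defaults, and the example strings are arbitrary.

```python
"""Simplified seed lookup: the 'find shared words first' idea behind BLAST-style search."""
from collections import defaultdict


def build_kmer_index(database: str, k: int) -> dict[str, list[int]]:
    """Map every length-k word in the database to the positions where it occurs."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(database) - k + 1):
        index[database[pos:pos + k]].append(pos)
    return index


def find_seed_hits(query: str, database: str, k: int) -> list[tuple[int, int]]:
    """Return (query_pos, database_pos) pairs where a length-k word matches exactly."""
    index = build_kmer_index(database, k)
    hits = []
    for qpos in range(len(query) - k + 1):
        for dpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, dpos))
    return hits


if __name__ == "__main__":
    db = "AATTGCCGCCGTCGTTTTCAGCAGTTATGTCAGATC"
    print(find_seed_hits("CAGTTATGTCAG", db, k=11))
```

A real tool would then try to extend each seed in both directions and keep only extensions that score well; regions with no shared seed are never examined at all.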

Before employing software like 
BLAST, however, geneticists first 
pre-select regions of the genome 
they want to compare. This
pre-selection is necessary because
certain regions of the human and 
chimp genomes are too 
divergent to be effectively 
compared using local alignment 
algorithms. Regions that are highly 
repetitive are also excluded (or 
"masked") because BLAST and 
other programs would return 
"inaccurate" results were these 
regions to be included. (More on 
this in the next section.) Ultimately, 
the final percentage similarity does 
not encompass the excluded 
regions. In other words, genome 
comparison is a measure of 
similarity in sequences that are 
already similar enough to be 
aligned. 

Of course, this kind of limited 
analysis makes sense for 
researchers who compare genetic
sequences between species
in order to investigate
shared genes and make strides in
medical science. However, it is
clear that this methodology is
fundamentally limited in its ability
to assay overall similarity across the
entirety of two genomes. After all,
regions of divergence beyond an
arbitrarily specified limit are
excluded out of hand. Outside of
such applications, it is not clear what
deeper scientific utility genome
comparison has other than being
an arbitrary, highly artificial
matching game for the purpose of
reaffirming deeply rooted beliefs
about the interrelation of humans
and apes.

To better appreciate this seemingly 
controversial statement, it helps to 
actually see the alignment 
methodologies in action and hear 
opinions from notable geneticists. 

The Sequence Alignment Problem

A non-specialist may wonder how 
exactly two nucleotide sequences 
like the ones above are compared. 
The answer is, there is no one way 
to do this. In fact, sequence
alignment is a very active field,
as researchers debate which
sequence alignment algorithms
yield the most "reliable,"
"high-quality" results. 46 As Stanford
Professor of Computer Science
Serafim Batzoglou remarks:

Recently, the literature on
basic methodology and
tools development has been
growing rather than
shrinking, indicating that the
alignment problem is still
not solved. How can that
be, after nearly 40 years of
research and literally
hundreds of available
tools? 47

In actuality, the Sequence 
Alignment Problem is more of a 
mathematical problem than a 
biological one. Furthermore, it is an 
open problem as no definitive 
solution exists. 48 Nonetheless, the 
central problem is easily stated: 
Given two (or more) sequences of 
letters (e.g., A, C, T, G) of a given 
length, how can we quantify the 
"distance" or "similarity" between 
them? For example, consider the
sequences below:

TCCCAGTTATGTCAGGGGACACGAGCATGCAGAGAC

AATTGCCGCCGTCGTTTTCAGCAGTTATGTCAGATC
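One simple, widely taught way to put a number on the "distance" between two strings is the edit (Levenshtein) distance: the minimum number of single-letter substitutions, insertions, and deletions needed to turn one string into the other. The sketch below computes it with the standard dynamic-programming table. It is offered only as one possible yardstick; it treats every edit as equally costly, which is itself a modeling choice that real alignment tools replace with tuned scoring schemes. The first example sequence is spelled as in the alignment displays later in this section.

```python
"""Edit (Levenshtein) distance: one of many possible ways to score how far apart two strings are."""


def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions, and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the letters match)
            )
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    s = "TCCCAGTTATGTCAGGGGACACGAGCATGCAGAGAC"
    t = "AATTGCCGCCGTCGTTTTCAGCAGTTATGTCAGATC"
    print(edit_distance(s, t))
```

Note that the distance is a single number for the whole pair; converting it into a "percent similarity" requires yet another choice, such as which length to divide by.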

This is precisely the kind of data 
analyzed in the relatively new field 
of bioinformatics. Without applying 
constraints, there are exponentially 
many ways to align the two 
sequences (two possibilities shown 
below): 

--T--CC-C-AGT--TATGT-CAGGGGACACG--A-GCATGCAGA-GAC
  |  || |  ||  | | | |||    || |  | |  | ||||   |
AATTGCCGCC-GTCGT-T-TTCAG----CA-GTTATG--T-CAGAT--C

                  tccCAGTTATGTCAGgggacacgagcatgcagagac
                     ||||||||||||
aattgccgccgtcgttttcagCAGTTATGTCAGatc

Gaps, represented by dashes, are
considered an acceptable way
to align the sequences because, in
evolutionary terms, the gaps
represent insertions or deletions
(i.e., "indels") of nucleotides in the
genetic sequence. Substitutions
(mismatched letters) are also allowed,
and some methods additionally model
rearrangements such as inversions,
which further expands the space of
possible alignments.
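How large is that space? If an alignment may pair two letters (a match or a substitution) or place a letter from either sequence against a gap, the number of distinct alignments of sequences of lengths m and n is the Delannoy number D(m, n). The sketch below counts them with a simple recurrence. It assumes the convention that alignments differing only in the order of adjacent gaps count as different, and it does not model inversions at all.

```python
"""Count the possible gapped alignments of two sequences (Delannoy numbers)."""
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(m: int, n: int) -> int:
    """Number of alignments of lengths m and n built from pairs and gaps in either row."""
    if m == 0 or n == 0:
        return 1  # only one way: every remaining letter faces a gap
    return (
        count_alignments(m - 1, n - 1)  # pair the last letters of both sequences
        + count_alignments(m - 1, n)    # last letter of the first sequence faces a gap
        + count_alignments(m, n - 1)    # last letter of the second sequence faces a gap
    )


if __name__ == "__main__":
    for length in (2, 3, 10, 36):
        print(length, count_alignments(length, length))
```

Even for the two 36-base toy sequences the count is astronomically large, which is why practical tools rely on dynamic programming and heuristics rather than examining alignments one by one.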

Now, given the two possibilities 
above, which alignment is correct? 
Out of the large number of possible 
alignments for this relatively short 
sequence, how can we determine 
which alignment correctly 
represents the phylogenetic relation 
between the two species? After all, 
we do not have access to the 
hypothesized common ancestor's 
DNA to compare, contrast, and 
grade each possibility. Be that as it 
may, once we decide which 
alignment is correct, we can then 
tabulate the percentage similarity 
by counting matches. 

The significance of all this is 
that the overall percentage 
similarity of the two sequences 
ultimately depends on the 
alignment scheme one chooses. 
Furthermore, the lack of a 
standardized alignment scheme 
renders comparative studies across 
different genomes problematic and 
the results dubious. 
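The point can be made concrete by counting matches in the two alignments displayed above. The sketch below transcribes them (the gapped global alignment as reconstructed here, and the 12-base local match) and computes a simple identity score, defined as matching columns divided by aligned columns. That definition is an assumption for illustration; published studies use a variety of definitions, for example dividing by the length of the shorter sequence or leaving gap columns out.

```python
"""Percent identity of the two example alignments, under one simple definition."""

# Global alignment: gaps ("-") are spread along the full length of both sequences.
GLOBAL_TOP = "--T--CC-C-AGT--TATGT-CAGGGGACACG--A-GCATGCAGA-GAC"
GLOBAL_BOTTOM = "AATTGCCGCC-GTCGT-T-TTCAG----CA-GTTATG--T-CAGAT--C"

# Local alignment: only the shared 12-base stretch is reported.
LOCAL_TOP = LOCAL_BOTTOM = "CAGTTATGTCAG"

FULL_LENGTH = 36  # each original sequence is 36 bases long


def count_matches(top: str, bottom: str) -> int:
    """Columns where both rows carry the same base (gap columns never count)."""
    return sum(a == b and a != "-" for a, b in zip(top, bottom))


def identity_over_columns(top: str, bottom: str) -> float:
    """Matches divided by the number of aligned columns, as a percentage."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    return 100.0 * count_matches(top, bottom) / len(top)


if __name__ == "__main__":
    print(f"global: {identity_over_columns(GLOBAL_TOP, GLOBAL_BOTTOM):.1f}%")  # 23 of 49 columns
    print(f"local:  {identity_over_columns(LOCAL_TOP, LOCAL_BOTTOM):.1f}%")  # 12 of 12 columns
    local_cover = 100 * count_matches(LOCAL_TOP, LOCAL_BOTTOM) / FULL_LENGTH
    print(f"local matches over full length: {local_cover:.1f}%")
```

The same pair of sequences thus comes out at under 50%, at 100%, or at about a third, depending on which alignment is accepted and what the matches are divided by.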


Large-scale comparisons between
genomes typically prefer "local
alignment" to "global
alignment." In the example above,
the top alignment represents a
global alignment and the bottom
one, local. In comparative genomic
studies like that of the 2005 Chimp
Consortium, local alignment is
preferred under the assumption
that, given long stretches of DNA,
only some portions are related in a
sea of uninteresting nucleotide
sequences. For example, the
local alignment above counts as
100% similarity; the
non-aligned areas are simply
disregarded. This is how alignment
using BLAST works: the program
takes a query sequence of a given
length, scans the database
genome, and returns the matches
that pass its significance cutoff,
some of greater or lesser
similarity due to indels,
substitutions, etc. The relative
location of the matches within the
context of the whole genome is not
factored in because, again, the
assumption is that the matching
sequences are surrounded by
insignificant regions whose exact
order does not matter. As American
biochemist Russell Doolittle notes:
"The underlying message is that
one must be alert to regions of
similarity even when they occur
embedded in an overall
background of dissimilarity." 49 The
background dissimilarity, of course,
is excluded from the overall
percentage similarity calculation.
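To see how a local alignment scores
only its best-matching segment, here is
a minimal Python sketch of
Smith-Waterman local alignment. BLAST
itself does not run this algorithm; it
uses a faster seed-and-extend heuristic
to find high-scoring local matches. The
scoring values (match +1, mismatch -1,
linear gap -2) and the function name are
illustrative assumptions.

```python
# Minimal Smith-Waterman local alignment with a linear gap penalty.
# Illustrative sketch only; the scoring values are arbitrary assumptions.


def smith_waterman(a: str, b: str, match: int = 1, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, str, str]:
    """Return (best score, aligned segment of a, aligned segment of b)."""
    a, b = a.upper(), b.upper()
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 option lets an alignment start fresh at any cell,
            # which is what makes the alignment local.
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    top = "tccCAGTTATGTCAGgggacacgagcatgcagagac"
    bottom = "aattgccgccgtcgttttcagCAGTTATGTCAGatc"
    score, seg_top, seg_bottom = smith_waterman(top, bottom)
    print("score:", score)
    print(seg_top)
    print(seg_bottom)
```

Because the shared 12-nucleotide segment
alone scores 12 under these values, the
best local score is at least 12, and any
sequence outside the returned segments
contributes nothing to it.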

In truth, when it comes to the 
human genome, even modest 
studies have to consider many 
kilobases of sequence data. 
Geneticists reduce the complexity 
of the alignment problem by 
limiting their sequence comparison 
to areas that are most amenable to 
alignment in the first place, which, 
most fortuitously, also just happen 
to be the areas of most genetic 
interest (at least prior to discovering
the importance of non-coding,
high-repetition regions,
transposable elements, etc. by
2010). There are separate computer
programs (e.g., DUST) that
"mask" these unwieldy,
"uninteresting" background regions
of the genome. 50 As mentioned 
above, masked regions are 
excluded from the overall 
percentage similarity calculation. 
But how significant is this 
exclusion? 
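A toy example gives a rough sense of
what masking does. The Python below is a
simplified, DUST-style filter written
for illustration, not NCBI's DUST code:
it scores each 64-base window by how
often its three-letter words repeat,
lowercases ("soft-masks") windows
scoring above 20, and then computes
identity only over columns left unmasked
in both sequences. The window size,
threshold, helper names, and sequences
are all assumptions made for the sketch.

```python
# Simplified, DUST-style low-complexity masking (illustration only; this is
# not NCBI's DUST implementation). Window size and threshold are assumptions.
from collections import Counter


def triplet_score(window: str) -> float:
    """Score how strongly a window's 3-letter words repeat (higher = simpler)."""
    triplets = [window[i:i + 3] for i in range(len(window) - 2)]
    n = len(triplets)
    if n < 2:
        return 0.0
    counts = Counter(triplets)
    return sum(c * (c - 1) / 2 for c in counts.values()) / (n - 1)


def soft_mask(seq: str, window: int = 64, threshold: float = 20.0) -> str:
    """Lowercase every base covered by a window scoring above the threshold."""
    seq = seq.upper()
    masked = [False] * len(seq)
    for start in range(max(1, len(seq) - window + 1)):
        end = min(start + window, len(seq))
        if triplet_score(seq[start:end]) > threshold:
            for k in range(start, end):
                masked[k] = True
    return "".join(c.lower() if m else c for c, m in zip(seq, masked))


def identity_unmasked(a: str, b: str) -> tuple[float, float]:
    """Percent identity over columns unmasked in both sequences, and the
    percentage of all columns on which that figure is based."""
    kept = [(x, y) for x, y in zip(a, b) if x.isupper() and y.isupper()]
    if not kept:
        return 0.0, 0.0
    same = sum(1 for x, y in kept if x == y)
    return 100 * same / len(kept), 100 * len(kept) / len(a)


if __name__ == "__main__":
    # Hypothetical sequences: an identical diverse "core" on both sides of an
    # 80-base run that differs completely between them (poly-A vs poly-T).
    core = "GATTACAGCTTGCAAGTCCTAGGCATCGATGCCTAAGTCGGATCCA"
    seq1 = soft_mask(core + "A" * 80 + core)
    seq2 = soft_mask(core + "T" * 80 + core)
    pct, used = identity_unmasked(seq1, seq2)
    print(f"identity on unmasked columns: {pct:.1f}%")
    print(f"share of columns actually compared: {used:.1f}%")
```

In this hypothetical pair the two
completely different 80-base runs are
masked away, so the identity reported on
the remaining columns is 100%; the second
printed number shows what share of the
sequence that figure actually covers.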

"Dark Matter" of the Genome

To understand the scale of "low 
complexity" repetitive regions, we 
can begin by quoting, in full, a 
passage from a 2011 study 
by Koning, et al.: 


Eukaryotic genomes contain
millions of copies of
transposable elements (TE)
and other repetitive
sequences. Indeed,
approximately half of the
sequence content of typical
mammalian genomes tends
to be annotated as TEs and
simple repeats by
conventional annotation
methods. By contrast, only
about 5-10% of
mammalian and vertebrate
genome sequences
comprise genes and known
functional elements. The
remaining 40-45% of the
genome is essentially of
unknown function, and is
sometimes referred to as the
"dark matter" of the human
genome. The origins of this
"dark matter" fraction of the
genome have presumably
been obscured, in part by
extensive rearrangement
and sequence divergence
over deep evolutionary
time. Understanding the
content and origins of this
huge uncharacterized
component of the genome
represents an important step
towards completely
deciphering the organization
and function of the human
genome sequence. 51

Transposable elements are DNA
sequences that can change position
within the genome.
Prior to studies like that of Koning, 
et al. and Bucher, et al., in 2011 
and 2012 respectively, which 
proved the importance of 
transposable elements, TEs were 
seen as "parasites of the host 
genome" whose only discernible 
function was to obfuscate the 
regions of the genome geneticists 
were most keen to investigate. 52 As 
we have seen, these regions were 
masked in the historical genome 
alignment studies, but as Koning, et
al., propose, these regions
constitute upwards of 66% of the 
genome. Other estimates range 
from 40% 53 to 50%. 54 

What this means is that studies like
that of the 2005 Chimp Consortium,
which masked repetitive regions and
disregarded the transposable areas
that surround aligned sequences, have
excluded up to 40% of the entire
genome from their analysis. In 2005
and as late as 2010, these
exclusions could be justified on the
basis that these regions had no
functional significance to the
organism or to phylogenetic
considerations generally. But, as
we have seen, research from
within the past six years shows that
such assumptions were gravely
mistaken.
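A back-of-the-envelope calculation shows
what such an exclusion does to a
headline figure. Let p be the identity
measured on the aligned, unmasked
portion and m the fraction of the genome
left out of the comparison. The values
below are illustrative only: pairing
p = 0.99 with the upper exclusion figure
m = 0.40 mixes numbers from different
sources.

```latex
\text{fraction of the whole genome shown to be identical}
  = p\,(1 - m) = 0.99 \times (1 - 0.40) = 0.594
```

This is a lower bound, not an estimate
of overall similarity, since the
excluded portion could turn out more or
less similar than the rest; it only
shows that a measured p speaks directly
to a fraction 1 - m of the genome.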

Conclusion 

Much more can be said about the 
scientific details of gene 
sequencing and genomic 
comparison. The fields of 
bioinformatics and evolutionary 
genetics have been in a state of 
rapid development over the past 
decade and show no sign of 
slowing. As popular media report 
on these developments, it becomes 
ever more crucial for commentators 
to include caveats and context 
when translating scientific findings 
to the lay public. This care and due
diligence will help ensure that
scientific data is not sloppily
misappropriated to buttress
ideological conclusions.

Beyond the perils of
slipshod reporting, a
recurring theme in reviewing
the comparative genomics
literature is that the science
itself is far from conclusive.
For example, multiple major
assumptions made in 2005
by the Chimp Consortium
study, such as those concerning
the relevance of non-coding,
repetitive, and transposable
regions, were unceremoniously
overturned by 2010. Yet, it
was only through such
selective evaluation of the
genome, guided by these
erroneous assumptions, that
the similarity percentage of
95% was obtained.

What does all this mean for
common descent? What is apparent
to many specialists, as cited above,
is that attempting to quantify
genome similarity is ultimately a
silly, meaningless endeavor.
Hopefully, this essay has provided
adequate substance to that
conclusion. In the end, declaring
99% similarity by itself
hardly counts in favor of common
descent, beyond its sheer rhetorical
force in swaying the uninitiated.
This does not mean that biologists
do not have other perceived
evidence for common descent
(some of which will be discussed
and evaluated in future essays, in
sha' Allah). Darwin, of course,
believed himself to have
discovered numerous evidences of
common descent as well, and
without the aid of genetic analysis.
None of these other evidences,
however, have played a bigger part
in the public consciousness and the
widespread acceptance of
Darwinian common descent than
the 99% similarity claim. But, as
we have seen, it simply does not
live up to the hype.